SAVOR: Part I

Science des données Avec : Visualisation, Organisation et Reproductibilité — Introduction

Boris Hejblum

June 10, 2025

Modern R for Data Science

R started in 1993:

2025 - 1993 = 32 years ago

A lot has happened since!

  • base vs tidyverse
  • R GUI vs RStudio
  • Rmarkdown, and then Quarto

Data Science

Data Science is an emerging field at the intersection of
statistics, computer science & data analysis

source: R for Data Science (2e), Wickham et al.

Course objectives

  • be able to successfully import and transform data in R (%>% & dplyr)
  • be able to choose and implement suitable and beautiful data visualizations (ggplot2)
  • be able to have a reproducible workflow through dynamic reporting
  • understand the differences and commonalities between:
    • software development
    • data analysis
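
As a minimal sketch of the first objective, the pipeline below filters and summarises R's built-in `mtcars` data set with dplyr and the `%>%` pipe. The choice of data set and of the columns `cyl`, `gear` and `mpg` is an assumption made for illustration; it is not taken from the course material.

```r
library(dplyr)  # loading dplyr also makes the %>% pipe available

# Keep 4-cylinder cars, then compute the mean fuel efficiency per gear count
mpg_by_gear <- mtcars %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n(), .groups = "drop")

print(mpg_by_gear)
```

Each step takes a data frame and returns a data frame, which is what lets the steps be chained and read top to bottom.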

Course organization

Parts:

  1. (brief) recap on basics
  2. Dynamic reproducible reporting with Quarto
  3. Data manipulation with dplyr
  4. Data visualization with ggplot2

In each part:

  • some key theoretical concepts
  • practical exercises to develop your skills and your autonomy

General advice

  • Google (or any other web search engine) & ChatGPT are your friends!

  • DRY vs WET:

    • Don’t Repeat Yourself!
    • or Write Everything Twice (you have time to spare!)

⇒ use function()
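
A minimal sketch of the DRY idea in base R: the same rescaling logic is first copy-pasted (WET), then written once as a function. The function name `rescale01` and the toy vectors are hypothetical, chosen only for illustration.

```r
# Toy data (made up for this example)
heights <- c(150, 160, 170, 180)
weights <- c(50, 60, 70, 90)

# WET: the same rescaling logic is copy-pasted for every vector
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))
weights_scaled <- (weights - min(weights)) / (max(weights) - min(weights))

# DRY: write the logic once, as a function (hypothetical name)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)       # c(min, max), ignoring missing values
  (x - rng[1]) / (rng[2] - rng[1])    # map min to 0 and max to 1
}

heights_dry <- rescale01(heights)
weights_dry <- rescale01(weights)
```

If the rescaling rule ever changes, the DRY version needs one edit instead of one per copy.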

brush-up

  • RStudio: use up-to-date, modern tools

  • use RStudio projects, always!!!

live demo

  • loops and functions (DRY)

Brush up practical

open SAVOR_practical1.html and follow along…

tidyverse

Tidy + Universe

Hadley Wickham 🫶

tidyverse: a collection of tidy R packages

🌐👉 www.tidyverse.org

tidy data

  1. each column represents a different variable
  2. each row represents one observation
  3. different observation types are stored in different tables (i.e. data.frame)

⇒ tidyverse: a collection of packages for working with/analyzing tidy data
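
To illustrate the first two rules, here is a small sketch that reshapes an untidy ("wide") table into a tidy ("long") one with `tidyr::pivot_longer()`. The table `cases_wide` and its values are invented for this example.

```r
library(tidyr)

# Untidy ("wide"): the variable `year` is hidden in the column names
cases_wide <- data.frame(
  country = c("A", "B"),
  `2023` = c(10, 20),
  `2024` = c(15, 25),
  check.names = FALSE  # keep "2023" and "2024" as column names
)

# Tidy ("long"): one column per variable, one row per observation
cases_tidy <- pivot_longer(
  cases_wide,
  cols = -country,       # reshape every column except `country`
  names_to = "year",     # former column names go here (as character)
  values_to = "cases"    # former cell values go here
)

print(cases_tidy)
```

In the tidy version, `country`, `year` and `cases` are each a column, and each row is a single country-year observation.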

Other resources

  • Posit cheat sheets and webinars
  • R for Data Science (2e), Wickham, Çetinkaya-Rundel & Grolemund 👉🌐
  • What They Forgot to Teach You About R, Bryan & Hester 👉🌐
  • many more